Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(llmobs): propagate distributed headers via signal dispatching, not config #12089

Merged
merged 6 commits into from
Jan 28, 2025

Conversation

Yun-Kim
Copy link
Contributor

@Yun-Kim Yun-Kim commented Jan 24, 2025

This PR makes a change to our shared distributed tracing header injection method to dispatch signals/events instead of relying on the global config settings, which is only modifiable via env vars. This fixes distributed tracing for users that might rely solely on the LLMObs.enable() setup config.

Programmatic LLMObs.enable()/disable() calls do not set the global config._llmobs_enabled boolean setting, which is only controlled by the DD_LLMOBS_ENABLED env var. This was problematic for users that relied on manual LLMObs.enable() setup (i.e. no env vars) because our distributed tracing injection code only checks the global config to inject llmobs parent IDs into request headers. If users manually enabled LLMObs without any env vars, then this would not be reflected in the global config value and thus LLMObs parent IDs would never be injected into the request headers.

We can't check directly if LLMObs is enabled in the http injection module because:

  1. This would require us to import significant product-specific LLMObs-code into the shared http injector helper module which would impact non-LLMObs users' app performance
  2. Circular imports in LLMObs which imports http injector logic to use in its own helpers

Instead of doing our check based on the global config._llmobs_enabled setting, we now send a tracing event to our shared product listeners, and register a corresponding LLMObs._inject_llmobs_context() hook to be called for all inject() calls if LLMObs is enabled (we check the LLMObs instance, not the global config setting value).

One risk and why I don't like changing global config settings is because this then implies that it is no longer global or tied to an env var (I want to push for env var configuration where possible over manual overriding/enabling). If a global enabled config can be toggled indiscriminately then this could open a can of worms for enabling/disabling logic in our LLMObs service, which isn't really designed to be toggled on/off multiple times in the app's lifespan. However if some users cannot rely on env vars, then I don't see any other solution that does not couple tracer internal code with LLMObs code which is a no-option. (UPDATE: we avoided this issue by using signal dispatching)

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance policy

@Yun-Kim Yun-Kim requested a review from a team as a code owner January 24, 2025 23:26
Copy link
Contributor

github-actions bot commented Jan 24, 2025

CODEOWNERS have been resolved as:

releasenotes/notes/fix-llmobs-enable-updates-config-45379a7a30e2e0e3.yaml  @DataDog/apm-python
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/propagation/http.py                                             @DataDog/apm-sdk-api-python
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability
tests/llmobs/test_propagation.py                                        @DataDog/ml-observability
tests/tracer/test_propagation.py                                        @DataDog/apm-sdk-api-python

@Yun-Kim Yun-Kim force-pushed the yunkim/llmobs-fix-enable branch from 6797009 to 79f185d Compare January 24, 2025 23:30
@Yun-Kim Yun-Kim requested a review from a team as a code owner January 24, 2025 23:30
@Yun-Kim Yun-Kim force-pushed the yunkim/llmobs-fix-enable branch from 79f185d to 883953c Compare January 24, 2025 23:33
@pr-commenter
Copy link

pr-commenter bot commented Jan 25, 2025

Benchmarks

Benchmark execution time: 2025-01-28 17:30:28

Comparing candidate commit aba113b in PR branch yunkim/llmobs-fix-enable with baseline commit 4f0bcb5 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 394 metrics, 2 unstable metrics.

Copy link
Contributor

@sabrenner sabrenner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for fixing! re:

We can't check directly if LLMObs is enabled in the http injection module

I'm not sure about the performance implications of core.dispatch, but I wonder if that could be of use, i.e., adding:

core.dispatch("tracer.span.inject", (headers, ))

from the HTTPPropagator.inject method and then adding our parent id to headers['x-datadog-tags']. not relevant to the fix here per se, but might make it easier for these injections to happen automatically thru auto-instrumented requests libraries instead of a user needing to call LLMObs.inject_distributed_headers directly. not sure of the validity here but maybe something we can follow up on 😄

@Yun-Kim Yun-Kim added the manual merge Do not automatically merge label Jan 27, 2025
@Yun-Kim Yun-Kim requested a review from a team as a code owner January 27, 2025 20:35
@Yun-Kim Yun-Kim changed the title fix(llmobs): toggle config when enabled/disabled programmatically fix(llmobs): propagate distributed headers via signal dispatching, not config Jan 27, 2025
@datadog-dd-trace-py-rkomorn
Copy link

datadog-dd-trace-py-rkomorn bot commented Jan 27, 2025

Datadog Report

Branch report: yunkim/llmobs-fix-enable
Commit report: aba113b
Test service: dd-trace-py

✅ 0 Failed, 130 Passed, 1378 Skipped, 4m 54.85s Total duration (35m 5.82s time saved)

@Yun-Kim Yun-Kim force-pushed the yunkim/llmobs-fix-enable branch from f8e5157 to db253b7 Compare January 27, 2025 21:52
Copy link
Contributor

@sabrenner sabrenner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new implementation lgtm!

@Yun-Kim Yun-Kim merged commit c4448ea into main Jan 28, 2025
638 checks passed
@Yun-Kim Yun-Kim deleted the yunkim/llmobs-fix-enable branch January 28, 2025 22:11
Copy link
Contributor

The backport to 2.19 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.19 2.19
# Navigate to the new working tree
cd .worktrees/backport-2.19
# Create a new branch
git switch --create backport-12089-to-2.19
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c4448ea4e5580cfa55684ef108a5dafafe1f6f75
# Push it to GitHub
git push --set-upstream origin backport-12089-to-2.19
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.19

Then, create a pull request where the base branch is 2.19 and the compare/head branch is backport-12089-to-2.19.

Copy link
Contributor

The backport to 2.20 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-2.20 2.20
# Navigate to the new working tree
cd .worktrees/backport-2.20
# Create a new branch
git switch --create backport-12089-to-2.20
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c4448ea4e5580cfa55684ef108a5dafafe1f6f75
# Push it to GitHub
git push --set-upstream origin backport-12089-to-2.20
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-2.20

Then, create a pull request where the base branch is 2.20 and the compare/head branch is backport-12089-to-2.20.

Yun-Kim added a commit that referenced this pull request Jan 28, 2025
…t config (#12089)

This PR makes a change to our shared distributed tracing header
injection method to dispatch signals/events instead of relying on the
global config settings, which is only modifiable via env vars. This
fixes distributed tracing for users that might rely solely on the
`LLMObs.enable()` setup config.

Programmatic `LLMObs.enable()/disable()` calls do not set the global
`config._llmobs_enabled` boolean setting, which is only controlled by
the `DD_LLMOBS_ENABLED` env var. This was problematic for users that
relied on manual `LLMObs.enable()` setup (i.e. no env vars) because our
distributed tracing injection code only checks the global config to
inject llmobs parent IDs into request headers. If users manually enabled
LLMObs without any env vars, then this would not be reflected in the
global config value and thus LLMObs parent IDs would never be injected
into the request headers.

We can't check directly if LLMObs is enabled in the http injection
module because:
1. This would require us to import significant product-specific
LLMObs-code into the shared http injector helper module which would
impact non-LLMObs users' app performance
2. Circular imports in LLMObs which imports http injector logic to use
in its own helpers

Instead of doing our check based on the global `config._llmobs_enabled`
setting, we now send a tracing event to our shared product listeners,
and register a corresponding `LLMObs._inject_llmobs_context()` hook to
be called for all inject() calls if LLMObs is enabled (we check the
LLMObs instance, not the global config setting value).

~One risk and why I don't like changing global config settings is
because this then implies that it is no longer global or tied to an env
var (I want to push for env var configuration where possible over manual
overriding/enabling). If a global enabled config can be toggled
indiscriminately then this could open a can of worms for
enabling/disabling logic in our LLMObs service, which isn't really
designed to be toggled on/off multiple times in the app's lifespan.
However if some users cannot rely on env vars, then I don't see any
other solution that does not couple tracer internal code with LLMObs
code which is a no-option.~ (UPDATE: we avoided this issue by using
signal dispatching)

- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing
strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note
guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if
[applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

- [x] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking
[API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces)
changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance
implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the
[release branch maintenance
policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.20 manual merge Do not automatically merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants